Predictive Maintenance



Predictive maintenance report by Cecil V. and Alonso M. The repository is available on GitHub.

Introduction


Mining is one of the most capital-intensive industries, and current global economic challenges are forcing it to reduce production expenses. Maintenance is one of the biggest expenditures. Thanks to data mining techniques, the available historical records of machine alarms and signals can be used to predict machine failures. This matters because repairing machines after they fail is less efficient than predictive maintenance.

Predictive Maintenance (PDM) differs significantly from other maintenance types. PDM directly monitors the status and performance of equipment during regular operation, which provides an opportunity to take precautions before a machine fails. Although PDM is more complex than the alternatives, it provides several advantages thanks to monitoring: it reduces maintenance cost, unnecessary preventive maintenance, and unplanned maintenance, and it makes work more efficient.

This report summarizes the work done and the results of using machine learning and data science techniques to acquire, clean, and analyze CAEX truck engine sensor measurements, along with the prediction results generated by different machine learning algorithms.

Approach

By exploring a CAEX truck engine’s sensor values over time, a machine learning algorithm can learn how sensor values, and changes in those values, relate to historical failures in order to predict future failures. Supervised machine learning algorithms will be used to make the following predictions:

  • Use of Regression algorithms to predict engine Time To Failure (TTF)

  • Use of Binary Classification algorithms to predict if the engine will fail in this period

  • Use of Multiclass Classification algorithms to predict the period an engine will fail

Development Environment

The work was done in both R and Python, bridged through the reticulate package.

Data Generation

Data acquisition

The files contain CAEX engine run-to-failure events, operational settings, and sensor measurements. They were provided by SPECTO Cummins and resampled every 15 minutes (1 cycle). The training data contains CAEX engines’ run-to-failure events from January, February, March, July, August, and September, for a total of 648 failures. The test data contains CAEX engines’ operating data from the last round of failures for each month in September.

Generated features from SAP Data:

Failures and maintenance target a specific subsystem component. This variable exists in SAP and is analyzed in var_dict.R. These subsystems were grouped into 6 components in order to improve accuracy.

  • fail_SS & fail_code: The failed sub-subsystem component, as a character string or encoded (predicted by a multiclass classifier).

  • maint_SS & maint_code: The maintained sub-subsystem, as a character string or encoded. This feature is generated from SAP data. The column is NaN by default and takes an encoded value if the engine was maintained within the closest cycle.
  • Features ending in “_SD” or “_AV” are generated by taking the moving standard deviation and moving average of each sensor.
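The moving-window features can be illustrated with a minimal sketch using only the Python standard library. The sensor name `oil_temp`, the window of 4 cycles, and the use of the sample standard deviation are assumptions made for illustration; the report does not state which window or estimator the project used.

```python
from collections import deque
from statistics import mean, stdev


def rolling_features(values, window=4):
    """Return (moving averages, moving standard deviations) over a trailing window.

    Positions where the window is not yet full are filled with None.
    The window length is a hypothetical choice, not taken from the report.
    """
    buf = deque(maxlen=window)  # keeps only the most recent `window` readings
    averages, deviations = [], []
    for value in values:
        buf.append(value)
        if len(buf) < window:
            averages.append(None)
            deviations.append(None)
        else:
            averages.append(mean(buf))
            deviations.append(stdev(buf))  # sample standard deviation (assumption)
    return averages, deviations


# Hypothetical sensor readings, one per 15-minute cycle.
oil_temp = [90.0, 92.0, 94.0, 96.0, 98.0]
oil_temp_AV, oil_temp_SD = rolling_features(oil_temp, window=4)
```

A trailing window only looks at past cycles, so the resulting features do not leak information from cycles that come after the one being scored.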

Generated features from Motor Sensor data:

  • RUL: Remaining Useful Life in cycles, also called Time To Failure (TTF). This variable is predicted by a regression algorithm.

  • RUL_CLASS: Factor variable representing the time window in which the engine will fail. The values 0, 1 and 2 mean that the engine is, respectively, more than 24 hours (96 cycles), less than 24 hours, and less than 8 hours (32 cycles) from a failure. This variable is predicted by a multiclass classification algorithm.

  • LIFETIME: Time between failures, in cycles. Used only for variable preprocessing.

  • LAG_RUL: The previous LIFETIME of the CAEX truck.
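The target variables above can be derived from the length of each run-to-failure episode. The sketch below is one possible reading of the definitions, not the project's actual code: the function names are hypothetical, RUL is assumed to reach 0 on the last cycle before failure, and the class boundaries are assumed to be strict ("less than 32 cycles", "less than 96 cycles").

```python
CYCLES_PER_HOUR = 4  # one cycle = 15 minutes, as stated in the report


def rul_class(rul_cycles):
    """Map RUL in cycles to RUL_CLASS (0: 24 h or more, 1: under 24 h, 2: under 8 h)."""
    if rul_cycles < 8 * CYCLES_PER_HOUR:
        return 2
    if rul_cycles < 24 * CYCLES_PER_HOUR:
        return 1
    return 0


def label_runs(run_lengths):
    """Build per-cycle labels for consecutive run-to-failure episodes of one truck.

    run_lengths: number of cycles in each episode, in chronological order.
    """
    rows = []
    previous_lifetime = None  # the first episode has no earlier run
    for lifetime in run_lengths:
        for cycle in range(lifetime):
            rul = lifetime - 1 - cycle  # assumed to be 0 on the last cycle
            rows.append({
                "LIFETIME": lifetime,
                "RUL": rul,
                "RUL_CLASS": rul_class(rul),
                "LAG_RUL": previous_lifetime,
            })
        previous_lifetime = lifetime
    return rows


# Two hypothetical episodes: 100 cycles, then 40 cycles.
rows = label_runs([100, 40])
```

In this example the first episode moves from class 0 to class 1 once RUL drops below 96 cycles, and every row of the second episode carries LAG_RUL = 100.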

Modelling Approach

By exploring a CAEX engine’s sensor values over time, a machine learning algorithm can learn how sensor values, and changes in those values, relate to historical failures in order to predict future failures. The following supervised machine learning algorithms will be used to make predictions:

Classification: Predict if the engine will fail in this period.

  • Given: Motor sensor data, features generated by taking the moving average and standard deviation, and other miscellaneous features.

  • Use: StratifiedGroupKFold cross-validated eXtreme Gradient Boosting (XGBoost). We assume that the classes are imbalanced with respect to class 0 (no failure).

  • To: Predict whether the CAEX truck is within a 24-hour window of its failure [“RUL_CLASS”]. The main output is an alarm that triggers if the probability of failure stays above a threshold for 4 cycles (1 hour).

  • TODO: Improve probability estimation; more insight is given in the code.
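The alarm rule described above (probability above a threshold for 4 consecutive cycles) can be sketched as follows. The function name and the threshold of 0.5 are hypothetical; the report does not state the threshold used, and the probabilities here are made-up values standing in for classifier output.

```python
def alarm_cycles(probabilities, threshold=0.5, min_consecutive=4):
    """Return one boolean per cycle: True once the failure probability has
    been above `threshold` for `min_consecutive` cycles in a row."""
    alarms = []
    streak = 0  # number of consecutive cycles above the threshold so far
    for p in probabilities:
        streak = streak + 1 if p > threshold else 0
        alarms.append(streak >= min_consecutive)
    return alarms


# Hypothetical per-cycle failure probabilities from the binary classifier.
probs = [0.2, 0.7, 0.8, 0.9, 0.6, 0.3, 0.9]
alarms = alarm_cycles(probs)
print(alarms)  # [False, False, False, False, True, False, False]
```

Requiring a run of consecutive cycles trades a delay of up to one hour for fewer alarms caused by a single noisy prediction.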

Regression: Predict Remaining Useful Life.

  • Given: Motor sensor data, features generated by taking the moving average and standard deviation, and other miscellaneous features.

  • Use: StratifiedKFold cross-validated Light Gradient Boosting (LightGBM) or Long Short-Term Memory (LSTM). We assume that the engine has more than 24 hours of RUL.

  • To: Predict the remaining useful life (RUL), or time to failure, of the CAEX.

  • TODO: Rewrite code using LSTM.

Classification: Predict which subsystem will fail.

  • Given: Motor sensor data, features generated by taking the moving average and standard deviation, and other miscellaneous features.

  • Use: StratifiedGroupKFold cross-validated eXtreme Gradient Boosting (XGBoost). We assume that the engine is within 24 hours of failure.

  • To: Predict which subsystem will fail within the last 24 hours of life of the CAEX truck [“fail_code”].

  • TODO: A study of how the probability of a particular engine failure increases over time.

Summary and Next Steps

The project tried to answer three essential questions in predictive maintenance: When will an engine fail? Will the engine fail in this period? Which subsystem will fail in this period?

By applying machine learning regression, binary classification, and multiclass classification algorithms, respectively, to historical engine sensor data, the project was able to provide some suggestions in response to the problem. Since predicting TTF is critical to all the modeling performed in this project, more work is required to improve regression performance. This could be done by fixing the data (outliers, resampling, etc.), trying other models, tuning model parameters, or adding mine data.

Feature selection and dimensionality reduction techniques should also be used to improve model performance and speed. Neural Net and SVM with RBF kernel required extensive computation and time.

Finally, the selected model in each category should be deployed for online accessibility.

 




A work by Cecil V., Alonso M.